The gene encoding the spike protein of the OC43 strain
of human coronavirus (HCV-OC43) was cloned and
sequenced. The complete nucleotide sequence revealed
an open reading frame of 4062 nucleotides encoding a
protein of 1353 amino acids with a predicted Mr of
150078. Structural features include 22 N-glycosylation
sites, an N-terminal hydrophobic signal sequence of 17
amino acids, a hydrophilic cysteine-rich sequence of 35
amino acids near the C terminus, and a potential
proteolytic cleavage site (RRSR) between amino acid
residues 758 and 759, yielding S1 and S2 segments of
Mr 84730 and 65366, respectively. The predicted amino
acid sequence of the spike protein of HCV-OC43 has
91% identity with that of the Mebus strain of bovine
coronavirus, revealing more sequence divergence in the
putative bulbous part (S1) than in the predicted stem
region (S2).


Human coronaviruses (HCV) are enveloped positive-stranded
RNA viruses that cause respiratory infections
and have been associated with gastrointestinal and
neurological disorders (Jouvenne et al., 1992; McIntosh,
1974; Macnaughton & Davies, 1981; Murray et al.,
1992; Resta et al., 1985; Stewart et al., 1992; Talbot &
Jouvenne, 1992; Tyrrell, 1986). They are categorized into
two major antigenic groups, represented by the prototype
strains 229E and OC43 (Macnaughton et al., 1981; Wege
et al., 1982).

The HCV-OC43 virion is composed of four structural
proteins. Three of them are transmembrane proteins:
spike (S), membrane (M) and haemagglutinin-esterase
(HE). The fourth protein is an internal nucleocapsid (N)
protein, possibly associated with the internal portion of
the M protein (Sturman et al., 1980). The N protein
binds to the virion RNA, forming the nucleocapsid of
the virion (Baric et al., 1988). Both coronavirus
glycoproteins (S and M) are synthesized in the endoplasmic
reticulum on membrane-bound ribosomes (Nieman et
al., 1982). The integral membrane M protein interacts
with the viral nucleocapsid and is believed to play a role
in determining the intracellular site of virus budding. The
S glycoprotein mediates binding of virions to the host
cell receptor (Williams et al., 1991; Delmas et al., 1992;
Yeager et al., 1992), possesses a fusogenic activity, is the
major target for antiviral neutralizing antibodies (Spaan
et al., 1988; Daniel & Talbot, 1990) and can also be
recognized by T lymphocytes (Korner et al., 1991).
During maturation and intracellular transport, some S
molecules are cleaved by host cell proteases, probably
located in the Golgi apparatus, to yield two large subunits
called S1 and S2 (Frana et al., 1985). Primary sequence
analysis suggested that the bulbous part of the S protein
is formed by the N-terminal half of the molecule, S1
(Cavanagh, 1983; de Groot et al., 1987a). The C-terminal
half of the S molecule, S2, is anchored in the
virion envelope and is predicted to form an intrachain
coiled-coil structure via heptad repeat patterns, which
would give it an elongated stem-like structure (de Groot
et al., 1987a).

The nucleotide sequence data reported in this paper have been
submitted to the EMBL and GenBank Nucleotide Sequence Databases
under accession number L14643.

HCV-OC43-infected cells contain a genomic-sized
viral mRNA plus eight subgenomic viral mRNA species
(Mounir & Talbot, 1993). These mRNAs are arranged in
a 3'-coterminal nested-set structure, in which the
sequence of every mRNA is contained within the sequence
of the next larger mRNA (Lai, 1990) and each mRNA
possesses a leader sequence identical to the 5' end of the
genome (S. Mounir & P. J. Talbot, unpublished).

The nucleotide and deduced amino acid sequences of
structural and non-structural proteins as well as the
leader sequence of HCV-OC43 have been determined
(Kamahora et al., 1989; Zhang et al., 1992; Mounir &
Talbot, 1992, 1993). Given the key biological importance
of the S glycoprotein in coronavirus pathogenesis and its
interaction with the immune system, as well as the
medical importance, both known and suspected, of
human coronaviruses, structure-function studies of the S
protein are highly important. As a first step, we now
report the nucleotide sequence of the S protein gene of
HCV-OC43 and compare it with the S protein gene of
the closely related bovine coronavirus (BCV).

Fig. 1. Complete nucleotide sequence of the S protein gene of HCV-OC43 and its deduced amino acid sequence. The intergenic
consensus sequence is doubly underlined. Potential N-glycosylation sites (o) are indicated. The N-terminal signal sequence and
C-terminal anchor domain are singly underlined. The putative proteolytic cleavage site is indicated by an arrow. Dashes indicate the
leucine zipper motif. Asterisks indicate termination codons. The conserved KWPWYVW motif preceding the transmembrane domain
is thickly underlined.

The origin and cultivation of the HRT-18 cells and the
OC43 strain of HCV as well as the preparation, reverse
transcription and PCR amplification of viral RNA were
described previously (Mounir & Talbot, 1992).
Poly(A)-containing RNA was selected with the PolyATtract
mRNA isolation system (Promega) according to the
manufacturer's instructions.

Four S gene-specific primers were designed for cDNA
synthesis and PCR amplification of the HCV-OC43 S
gene, based on the high degree of genomic similarity
between HCV-OC43 and BCV (Mounir & Talbot, 1992,
1993). The sense S1H primer represented the sequence 5'
GCTGCATGATGCTTAGAC 3' (nucleotides 2 to 19,
Fig. 1), the sense S2H primer was 5'
GCGATTACCACTGGTTATCGG 3' (nucleotides 2306 to 2326, Fig. 1)
corresponding to the sequence downstream of the
putative proteolytic cleavage site; the antisense S1I
primer was 5' CCGATAACCAGTGGTAATCGC 3'
(nucleotides 2306 to 2326, Fig. 1) and the antisense S2I
primer 5' GGGCGTGGCCTTAAGAAC 3' (nucleotides
4158 to 4175, Fig. 1). Tandem EcoRI sites were added at
the 5' end of each oligonucleotide for cloning purposes.
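
The relationship between the S2H and S1I primers can be checked directly: both are assigned to nucleotides 2306 to 2326, so the antisense primer should be the reverse complement of the sense primer. The following is a minimal Python sketch of that check; the helper name `reverse_complement` and the constant names are chosen here for illustration and are not part of the original methods.

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text (5' to 3'), without the added EcoRI sites.
S2H_SENSE = "GCGATTACCACTGGTTATCGG"      # nucleotides 2306 to 2326, sense
S1I_ANTISENSE = "CCGATAACCAGTGGTAATCGC"  # same coordinates, opposite strand

# True when the two primers anneal to the same 21-nucleotide region.
primers_overlap = reverse_complement(S2H_SENSE) == S1I_ANTISENSE
print(primers_overlap)
```

The two primers covering the same 21-nucleotide region is consistent with amplifying the gene as two fragments that meet just downstream of the putative cleavage site, although the primer pairing itself is inferred from the primer names rather than stated in the text.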

Different purified PCR products were cloned into the
pBluescript II SK+ vector (Stratagene). Unidirectional
deletions of the inserts were created using exonuclease III
and mung bean nuclease (Pharmacia). Sequencing was
performed on both strands of the PCR products by the
dideoxyribonucleotide chain termination method (Sanger
& Coulson, 1975), with universal, reverse or specific
primers corresponding to various regions of the S gene,
using T7 DNA polymerase (Pharmacia) and [35S]dATP
(Amersham). Sequence analyses were performed as
described previously (Mounir & Talbot, 1992).

The complete nucleotide sequence of the HCV-OC43 S
gene and its predicted amino acid sequence are shown in
Fig. 1, together with some structural features. The
sequence begins 15 nucleotides downstream of the
termination codon of the HE gene (Zhang et al., 1992).
A single open reading frame (nucleotides 32 to 4093)
encodes a polypeptide of 1353 amino acids (aa), with a
predicted Mr of 150078. The sequence UCUAAAC at
nucleotides 25 to 31 is identical to the conserved
intergenic sequence of BCV (Abraham et al., 1990),
murine hepatitis virus (MHV) strain A59 (Luytjes et al.,
1987) and MHV-JHM (Schmidt et al., 1987), and almost
identical to the ACUAAAC sequence found in
transmissible gastroenteritis virus (Rasschaert & Laude,
1987), porcine respiratory coronavirus (Rasschaert et al.,
1990), and feline infectious peritonitis virus (de Groot et
al., 1987b). The deduced aa sequence of the HCV-OC43
S protein contains 22 potential N-glycosylation sites, 13
in S1 and nine in S2 (Fig. 1).
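
The open reading frame coordinates and the stated protein length are mutually consistent, which a short calculation makes explicit. The sketch below assumes 1-based inclusive coordinates, as used in the text; the function name `orf_summary` is illustrative only.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarize an ORF given 1-based inclusive nucleotide coordinates."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length_nt // 3
    return {
        "length_nt": length_nt,
        "codons": codons,           # includes the termination codon
        "amino_acids": codons - 1,  # the termination codon encodes no residue
    }


# Coordinates of the S gene ORF as reported in the text.
summary = orf_summary(32, 4093)
print(summary)
```

With the reported coordinates this gives 4062 nucleotides and 1354 codons, that is, 1353 amino acids plus the termination codon.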

The HCV-OC43 S protein shares several properties
with the S proteins of other coronaviruses. The first
initiation codon at nucleotides 32 to 34 is followed by a
potential signal peptide with a possible cleavage site (von
Heijne, 1984) between aa residues 17 and 18. There are
17 hydrophobic residues near the C terminus (aa 1302 to
1318, Fig. 1) that represent the transmembrane domain.
A stretch of eight aa (KWPWYVWL, aa 1295 to 1302;
Fig. 1) of unknown function is found in all coronavirus
S proteins sequenced to date (Britton, 1991). A leucine
zipper motif (aa 1270 to 1284, Fig. 1) terminates 10
amino acid residues upstream of this conserved KWPW
motif, which is located next to the transmembrane
domain. It may be involved in the oligomerization of the
S protein (Britton, 1991). A cysteine-rich hydrophilic C terminus
of 35 aa (aa 1319 to 1353; Fig. 1), which is probably the
intravirion domain, is also found in other coronavirus S
proteins (Abraham et al., 1990; Schmidt et al., 1987;
Binns et al., 1985; Luytjes et al., 1987; de Groot et al.,
1987b; Rasschaert & Laude, 1987).
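
The coordinates of the C-terminal features can be reproduced from the deduced sequence. The sketch below uses residues 1291 to 1353 as transcribed from the paper's sequence figures; the fragment boundaries, the helper name `locate` and the variable names are choices made here for illustration.

```python
# C-terminal residues 1291 to 1353 of the deduced HCV-OC43 S protein,
# transcribed from the paper's sequence figures (one-letter code).
C_TERMINAL = (
    "EYYVKWPWYVWLLICLAGVAMLVLLFFI"
    "CCCTGCGTSCFKKCGGCCDDYTGYQELVIKTSHDD"
)
FIRST_RESIDUE = 1291  # protein coordinate of the first residue in C_TERMINAL


def locate(motif: str, fragment: str = C_TERMINAL, offset: int = FIRST_RESIDUE):
    """Return 1-based (start, end) protein coordinates of motif, or None."""
    index = fragment.find(motif)
    if index < 0:
        return None
    start = offset + index
    return start, start + len(motif) - 1


conserved = locate("KWPWYVWL")                 # conserved eight-residue stretch
tail = C_TERMINAL[1319 - FIRST_RESIDUE:]       # aa 1319 to 1353
print(conserved, len(tail), tail.count("C"))
```

The conserved stretch maps to aa 1295 to 1302, and the 35-residue tail contains eight cysteines, in line with its description as cysteine-rich.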

The basic amino acid sequence RRSR (at positions
754 to 757, Fig. 1) is located in the hydrophilic region of
the molecule (data not shown). This sequence resembles
the BCV (Mebus strain) cleavage site RRSRR (Abraham
et al., 1990), the bovine enteric coronavirus F15 strain
RRSVR (Boireau et al., 1990) and the infectious
bronchitis virus RRFRR (Binns et al., 1985). Cleavage of
the S protein would divide the molecule into an
N-terminal segment S1 of Mr 84730 and a C-terminal
segment S2 of Mr 65366. Assuming a mean Mr of 2100 for
addition at each N-glycosylation site (Hunter et al., 1983)
and the utilization of all sites, the mature S protein would
comprise an S1 moiety of 112030 and S2 of 84266, for a
total Mr of 196296. These values correspond to the observed
sizes (Mounir & Talbot, 1992). Interestingly, most
coronavirus S proteins, including that of the Mebus
strain of BCV, possess two basic amino acids at the
proteolytic cleavage site, whereas HCV-OC43 has only
one (Abraham et al., 1990; Cavanagh et al., 1986a;
Luytjes et al., 1987; Schmidt et al., 1987). Cleavage sites
of other viral surface proteins all contain one or two
basic residues (Bosch et al., 1981; Dalgarno et al., 1983;
Garoff et al., 1980; Paterson et al., 1984; Porter et al.,
1979; Rice & Strauss, 1981; Schwartz et al., 1983;
Shinnick et al., 1981). A cytopathic effect (c.p.e.) can be
observed upon infection of HRT-18 cells by BCV but not
by HCV-OC43 (data not shown). It is tempting to speculate
that the number of basic amino acids at the cleavage site
may influence the efficiency of viral infection.
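
The glycosylated sizes quoted above follow from simple arithmetic on the polypeptide Mr values and the number of potential N-glycosylation sites per segment (13 in S1 and nine in S2). A minimal sketch, assuming as the text does that every site is used and adds a mean Mr of 2100; the function and constant names are illustrative.

```python
MR_PER_GLYCAN = 2100  # mean Mr added per N-glycosylation site, as assumed in the text


def glycosylated_mr(polypeptide_mr: int, sites: int, per_site: int = MR_PER_GLYCAN) -> int:
    """Predicted Mr of a segment when all potential N-glycosylation sites are used."""
    return polypeptide_mr + sites * per_site


s1 = glycosylated_mr(84730, 13)  # S1: 13 potential sites
s2 = glycosylated_mr(65366, 9)   # S2: nine potential sites
print(s1, s2, s1 + s2)
```

This reproduces the values of 112030 for S1, 84266 for S2 and 196296 for the complete mature protein.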

As shown in Fig. 2, the S protein of HCV-OC43 is
very similar to the corresponding protein of the Mebus
strain of BCV, with an identity of 91%. The S proteins
of both strains of MHV (MHV-A59 and -JHM) show
only 62 and 59% identity, respectively, with their
HCV-OC43 counterpart (data not shown).
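
For readers unfamiliar with how an identity percentage is derived from a pairwise alignment, the sketch below shows one common definition: the fraction of aligned columns without a gap in which both sequences carry the same residue. This is an assumption for illustration; the exact scoring used by the alignment program is not described in the text, and the toy sequences are hypothetical and not taken from Fig. 2.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: six ungapped columns, five of them identical.
toy_identity = percent_identity("MKT-LLVA", "MRTALL-A")
print(round(toy_identity, 1))
```

Other definitions divide by the length of the shorter sequence or by the full alignment length including gaps, so identity values from different programs are not always directly comparable.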

The S protein of HCV-OC43 is composed of 1353
residues, whereas the S protein of BCV contains 1363
residues. This difference appears in the N-terminal S1
region. The function of the additional sequence in BCV
is not known. Sequence comparison between HCV-OC43
and BCV (Fig. 2) revealed more sequence
divergence in S1 than in S2. This observation is consistent
with the model which suggests that the S1 subunit forms
the bulbous part of the S protein and S2 the stem region
(Cavanagh, 1983; de Groot et al., 1987a). Antigenic sites
involved in virus neutralization have been identified in
both S1 and S2 (Daniel et al., 1993; Luytjes et al., 1989;
Stühler et al., 1991; Talbot et al., 1988; Takase-Yoden et
al., 1990; Vautherot et al., 1992; Cavanagh et al., 1986b).
The comparison of the amino acid sequences of HCV-OC43
and BCV S proteins indicates that these viruses
arose from a common progenitor.

Fig. 2. Amino acid sequence comparison of the HCV-OC43 S protein with that of BCV (Abraham et al., 1990), by alignment for
maximum identity. Dots indicate identical residues; hyphens represent gaps introduced into the sequence; the arrow indicates the
putative proteolytic cleavage site. The analysis was performed with the GeneWorks 2.2.1 program (IntelliGenetics) using default
settings.

Molecular studies of the HCV-OC43 S protein gene
are important for understanding the interaction between
the virus, the host cell and the immune system during
infection. The study of the remainder of the genome of
HCV-OC43 should provide important information on
the replication, tropism, pathogenesis and evolution of
this human pathogen. Such studies are in progress.